Iris data


High-Dimensional Data Classification in Concentric Coordinates

Williams, Alice, Kovalerchuk, Boris

arXiv.org Artificial Intelligence

Alice Williams, Department of Computer Science, Central Washington University, USA (ORCID 0009-0001-6154-2407); Boris Kovalerchuk, Department of Computer Science, Central Washington University, USA (ORCID 0000-0002-0995-9539)

Abstract -- The visualization of multidimensional data with interpretable methods remains limited by the lack of high-dimensional lossless visualizations that avoid occlusion and remain computationally tractable through parameterized visualization. This paper proposes a framework supporting low- to high-dimensional data using lossless Concentric Coordinates, a more compact generalization of Parallel Coordinates, along with the earlier Circular Coordinates. Both are forms of General Line Coordinates visualizations that can directly support machine learning algorithm visualization and facilitate human interaction.

A. Motivation. In some domains, accurate and interpretable classification models can already be visualized accurately. In many other domains, however, this remains a long-standing and critical roadblock to deploying artificial intelligence and machine learning (AI/ML) models, and it is especially challenging for high-risk tasks such as healthcare diagnostics. Visualization of multidimensional (n-D) data classification is critical for three major reasons: (1) to speed up analysis of prediction accuracy, (2) to interpret/explain classifier predictions, and (3) to improve/modify the prediction model.

B. Overview of Existing Methods. AI/ML tasks on high-dimensional (n-D) data are commonly approached with black-box deep learning (DL) methods that inherently lack interpretability and decision explanation, relying instead on post hoc explainability methods such as LIME or SHAP [7]. Moreover, visualization methods commonly preprocess data with dimensionality reduction (DR) methods such as Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), or similar approximations. Such methods are lossy and not reversible, and therefore commonly introduce inaccuracies in n-D that cannot be visually verified. Alternatively, lossless visualizations allow the use of Visual Knowledge Discovery (VKD) to visually discover algorithmic adjustments that improve ML prediction models [5].
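As a rough illustration of the lossless layouts discussed above, the sketch below draws the iris data both in standard Parallel Coordinates and in a simple concentric rendering in which each attribute occupies its own circle and the normalized value sets the angular position. The concentric mapping here is an assumption for illustration only and is not necessarily the exact Concentric Coordinates construction of the paper.

```python
# Minimal sketch (not the paper's implementation): Parallel Coordinates vs.
# one plausible concentric rendering of the same lossless polylines.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
# Min-max normalize each attribute to [0, 1] so all axes share a scale.
Xn = (X - X.min(axis=0)) / np.ptp(X, axis=0)

fig, (ax_pc, ax_cc) = plt.subplots(1, 2, figsize=(11, 5))

# Parallel Coordinates: one vertical axis per attribute, one polyline per sample.
for xi, label in zip(Xn, y):
    ax_pc.plot(range(X.shape[1]), xi, color=plt.cm.tab10(label), alpha=0.3)
ax_pc.set_xticks(range(X.shape[1]))
ax_pc.set_xticklabels(iris.feature_names, rotation=20)
ax_pc.set_title("Parallel Coordinates (iris)")

# Concentric rendering (illustrative): attribute i lives on a circle of
# radius i + 1; the value picks the angle; samples remain lossless polylines.
radii = np.arange(1, X.shape[1] + 1)
for xi, label in zip(Xn, y):
    theta = 2 * np.pi * xi                  # value -> angle on each circle
    px, py = radii * np.cos(theta), radii * np.sin(theta)
    ax_cc.plot(px, py, color=plt.cm.tab10(label), alpha=0.3)
for r in radii:                             # draw the attribute circles
    ax_cc.add_patch(plt.Circle((0, 0), r, fill=False, color="gray", lw=0.5))
ax_cc.set_aspect("equal")
ax_cc.set_title("Concentric rendering (illustrative)")
plt.tight_layout()
plt.show()
```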


Extending Explainable Ensemble Trees (E2Tree) to regression contexts

Aria, Massimo, Gnasso, Agostino, Iorio, Carmela, Fokkema, Marjolein

arXiv.org Machine Learning

Ensemble methods such as random forests have transformed the landscape of supervised learning, offering highly accurate prediction through the aggregation of multiple weak learners. However, despite their effectiveness, these methods often lack transparency, impeding users' comprehension of how random forest (RF) models arrive at their predictions. Explainable ensemble trees (E2Tree) is a novel methodology for explaining random forests that provides a graphical representation of the relationship between response variables and predictors. A striking characteristic of E2Tree is that it not only accounts for the effects of predictor variables on the response but also accounts for associations between the predictor variables through the computation and use of dissimilarity measures. The E2Tree methodology was initially proposed for use in classification tasks. In this paper, we extend the methodology to encompass regression contexts. To demonstrate the explanatory power of the proposed algorithm, we illustrate its use on real-world datasets.
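E2Tree's own dissimilarity computation is not reproduced here, but a closely related and widely used ingredient is the random-forest proximity between observations: how often two samples end up in the same terminal node across the forest. The sketch below computes such a proximity-based dissimilarity matrix for a fitted random forest regressor; it is an analogue for illustration, not the E2Tree measure itself.

```python
# Random-forest dissimilarity between observations: the fraction of trees
# in which two samples do NOT share a terminal node (classic RF proximity),
# shown only as a rough analogue of the dissimilarity idea used by E2Tree.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# leaves[i, t] = index of the terminal node that sample i reaches in tree t.
leaves = rf.apply(X)                                      # (n_samples, n_trees)
same_leaf = leaves[:, None, :] == leaves[None, :, :]      # pairwise, per tree
proximity = same_leaf.mean(axis=2)                        # share of trees with a common leaf
dissimilarity = 1.0 - proximity

print(dissimilarity.shape)                                # (442, 442)
print(dissimilarity[0, :5].round(3))
```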


Synthetic Data Generation and Automated Multidimensional Data Labeling for AI/ML in General and Circular Coordinates

Williams, Alice, Kovalerchuk, Boris

arXiv.org Artificial Intelligence

An insufficient amount of available training data is a critical challenge for both development and deployment of artificial intelligence and machine learning (AI/ML) models. This paper proposes a unified approach to both synthetic data generation (SDG) and automated data labeling (ADL) with a single SDG-ADL algorithm. SDG-ADL uses multidimensional (n-D) representations of data visualized losslessly with General Line Coordinates (GLCs), relying on reversible GLC properties to visualize n-D data in multiple GLCs. This paper demonstrates the use of the new Circular Coordinates in static and dynamic forms, together with Parallel Coordinates and Shifted Paired Coordinates, since each GLC exemplifies unique data properties, such as interattribute n-D distributions and outlier detection. The approach is interactively implemented in computer software with the Dynamic Coordinates Visualization system (DCVis). Results with real data are demonstrated in case studies, evaluating the impact on classifiers.
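The SDG-ADL algorithm itself relies on interactive GLC visualization and is not reproduced here; the sketch below only illustrates, in plain Python, the two ingredients the abstract combines: generating synthetic samples by jittering real samples within each class, and automatically labeling new points by the nearest class centroid. Both helper functions are hypothetical simplifications for illustration.

```python
# Generic illustration (not the paper's SDG-ADL algorithm) of synthetic data
# generation by per-class jitter plus automated labeling by nearest centroid.
import numpy as np
from sklearn.datasets import load_iris

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

def synthesize(X, y, n_per_class=50, scale=0.05):
    """Generate synthetic samples by Gaussian jitter around real samples, per class."""
    parts_X, parts_y = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        picks = Xc[rng.integers(0, len(Xc), n_per_class)]
        noise = rng.normal(0.0, scale * Xc.std(axis=0), picks.shape)
        parts_X.append(picks + noise)
        parts_y.append(np.full(n_per_class, c))
    return np.vstack(parts_X), np.concatenate(parts_y)

def auto_label(X_new, X_ref, y_ref):
    """Label new points with the class of the nearest class centroid."""
    classes = np.unique(y_ref)
    centroids = np.stack([X_ref[y_ref == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

X_syn, y_syn = synthesize(X, y)
y_pred = auto_label(X_syn, X, y)
print("agreement with generating class:", (y_pred == y_syn).mean())
```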


General Line Coordinates in 3D

Martinez, Joshua, Kovalerchuk, Boris

arXiv.org Artificial Intelligence

Interpretable interactive visual pattern discovery in lossless 3D visualization is a promising way to advance machine learning. It enables end users who are not data scientists to take control of the model development process as a self-service. It is conducted in 3D General Line Coordinates (GLC) visualization space, which preserves all n-D information in 3D. This paper presents a system that combines three types of GLC: Shifted Paired Coordinates (SPC), Shifted Tripled Coordinates (STC), and General Line Coordinates-Linear (GLC-L) for interactive visual pattern discovery. The transition from 2-D to 3-D visualization yields more distinct visual patterns and allows finding the best data viewing positions, neither of which is available in 2-D. It enables in-depth visual analysis of various class-specific data subsets comprehensible to end users in the original interpretable attributes. Controlling model overgeneralization by end users is an additional benefit of this approach.
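A minimal 2-D version of one of the GLC types named above, Shifted Paired Coordinates, can be sketched as follows: each 4-D iris sample is split into two attribute pairs, each pair is plotted in its own shifted Cartesian plane, and the sample becomes a polyline joining the two planes. The paper's 3-D system is not reproduced; this shows only the basic SPC idea.

```python
# Minimal 2-D sketch of Shifted Paired Coordinates (SPC) on the iris data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
Xn = (X - X.min(axis=0)) / np.ptp(X, axis=0)      # normalize to [0, 1]

pairs = [(0, 1), (2, 3)]                          # (sepal len, wid), (petal len, wid)
shift = 1.5                                       # horizontal offset between planes

fig, ax = plt.subplots(figsize=(7, 4))
for xi, label in zip(Xn, y):
    xs = [xi[a] + k * shift for k, (a, _) in enumerate(pairs)]
    ys = [xi[b] for (_, b) in pairs]
    ax.plot(xs, ys, marker="o", ms=2, color=plt.cm.tab10(label), alpha=0.3)

# Draw the unit square marking each shifted plane.
for k in range(len(pairs)):
    ax.add_patch(plt.Rectangle((k * shift, 0), 1, 1, fill=False, color="gray"))
ax.set_aspect("equal")
ax.set_title("Shifted Paired Coordinates (2-D sketch, iris)")
plt.show()
```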


Criticality Analysis: Bio-inspired Nonlinear Data Representation

olde Scheper, Tjeerd V.

arXiv.org Artificial Intelligence

The representation of arbitrary data in a biological system is one of the most elusive elements of biological information processing. The often logarithmic nature of information in amplitude and frequency presented to biosystems prevents simple encapsulation of the information contained in the input. Criticality Analysis (CA) is a bio-inspired method of information representation within a controlled self-organised critical system that allows scale-free representation. This is based on the concept of a reservoir of dynamic behaviour in which self-similar data will create dynamic nonlinear representations. This unique projection of data preserves the similarity of data within a multidimensional neighbourhood. The input can be reduced dimensionally to a projection output that retains the features of the overall data, yet has a much simpler dynamic response. The method depends only on the rate control of chaos applied to the underlying controlled models, which allows the encoding of arbitrary data and promises optimal encoding of data given biologically relevant networks of oscillators. The CA method allows for a biologically relevant encoding mechanism of arbitrary input to biosystems, creating a suitable model for information processing in organisms of varying complexity and scale-free data representation for machine learning.
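Reproducing rate-controlled chaotic oscillators faithfully is beyond a short sketch, so the following is only a loose, generic stand-in for the "reservoir of dynamic behaviour" idea: a tiny echo-state-style random reservoir that projects an input signal into the transient dynamics of a fixed nonlinear system and reads out a low-dimensional representation. It is explicitly not the Criticality Analysis method.

```python
# Generic echo-state-style reservoir (a stand-in, not Criticality Analysis):
# drive a fixed random nonlinear system with an input and read out 3 channels.
import numpy as np

rng = np.random.default_rng(1)
n_res, n_in, n_out = 100, 1, 3

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # keep spectral radius below 1
W_out = rng.normal(0, 1, (n_out, n_res))          # fixed random readout

def project(u):
    """Run the input sequence through the reservoir; return the readout trace."""
    x = np.zeros(n_res)
    trace = []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        trace.append(W_out @ x)
    return np.array(trace)

signal = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.1 * rng.normal(size=400)
out = project(signal)
print(out.shape)    # (400, 3): a 3-D dynamic representation of the 1-D input
```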


Clustering -- Basic concepts and methods

Kapp-Joswig, Jan-Oliver Felix, Keller, Bettina G.

arXiv.org Artificial Intelligence

We review clustering as an analysis tool and the underlying concepts from an introductory perspective. What is clustering and how can clusterings be realised programmatically? How can data be represented and prepared for a clustering task? And how can clustering results be validated? Connectivity-based versus prototype-based approaches are reflected in the context of several popular methods: single-linkage, spectral embedding, k-means, and Gaussian mixtures are discussed as well as the density-based protocols (H)DBSCAN, Jarvis-Patrick, CommonNN, and density-peaks.
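As a companion to the overview, the short sketch below runs several of the reviewed method families on the same standardized data via scikit-learn: prototype-based k-means and Gaussian mixtures, connectivity-based single linkage, and density-based DBSCAN, compared against the known species labels with the adjusted Rand index.

```python
# Compare a few clustering families from the review on the standardized iris data.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
Xs = StandardScaler().fit_transform(X)            # standardize before clustering

methods = {
    "k-means":          KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs),
    "Gaussian mixture": GaussianMixture(n_components=3, random_state=0).fit_predict(Xs),
    "single linkage":   AgglomerativeClustering(n_clusters=3, linkage="single").fit_predict(Xs),
    "DBSCAN":           DBSCAN(eps=0.8, min_samples=5).fit_predict(Xs),
}
for name, labels in methods.items():
    print(f"{name:16s} ARI vs. species: {adjusted_rand_score(y, labels):.2f}")
```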


Shape complexity in cluster analysis

Aguilar, Eduardo J., Barbosa, Valmir C.

arXiv.org Artificial Intelligence

In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimension. Like division by the standard deviation, the great majority of scaling techniques can be said to have roots in some sort of statistical take on the data. Here we explore the use of multidimensional shapes of data, aiming to obtain scaling factors for use prior to clustering by some method, like k-means, that makes explicit use of distances between samples. We borrow from the field of cosmology and related areas the recently introduced notion of shape complexity, which in the variant we use is a relatively simple, data-dependent nonlinear function that we show can be used to help with the determination of appropriate scaling factors. Focusing on what might be called "midrange" distances, we formulate a constrained nonlinear programming problem and use it to produce candidate scaling-factor sets that can be sifted on the basis of further considerations of the data, say via expert knowledge. We give results on some iconic data sets, highlighting the strengths and potential weaknesses of the new approach. These results are generally positive across all the data sets used.
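The shape-complexity optimization itself is not reproduced below; the sketch only shows where per-dimension scaling factors enter such a pipeline: divide each dimension by its factor, then cluster with k-means. The candidate factor set in the example is a hypothetical placeholder standing in for the output of the constrained optimization.

```python
# Where per-dimension scaling factors plug in before k-means; the candidate
# factors below are a hypothetical placeholder, not the paper's optimized set.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

def cluster_with_factors(X, factors, k=3):
    """Scale each dimension by its factor, cluster, and score against known labels."""
    Xs = X / factors
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xs)
    return adjusted_rand_score(y, labels)

std_factors = X.std(axis=0)                           # the classical baseline scaling
candidate_factors = np.array([1.0, 2.0, 0.5, 0.5])    # placeholder candidate set

print("std-dev scaling   ARI:", round(cluster_with_factors(X, std_factors), 2))
print("candidate scaling ARI:", round(cluster_with_factors(X, candidate_factors), 2))
```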


Using SingleStoreDB, MindsDB, and Deepnote - DZone Big Data

#artificialintelligence

This article will show how to use SingleStoreDB with MindsDB using Deepnote. We'll create integrations within Deepnote, load the Iris flower data set into SingleStoreDB, and then use MindsDB to create a Machine Learning (ML) model from the Iris data stored in SingleStoreDB. We'll also make some example predictions using the ML model. Most of the code will be in SQL, enabling developers with solid SQL skills to hit the ground running and start working with ML immediately. The notebook file used in this article is available on GitHub.


Clustering performance analysis using new correlation based cluster validity indices

Wiroonsri, Nathakhun

arXiv.org Machine Learning

There are various cluster validity measures used for evaluating clustering results. One of the main objectives of using these measures is to seek the optimal unknown number of clusters. Some measures work well for clusters with different densities, sizes and shapes. Yet, one weakness that those validity measures share is that they sometimes provide only one clear optimal number of clusters. That number is actually unknown, and there might be more than one potential sub-optimal option that a user may wish to choose based on different applications. We develop two new cluster validity indices based on a correlation between the actual distance between a pair of data points and the distance between the centroids of the clusters in which the two points are located. Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness stated above. Furthermore, the introduced correlation can also be used for evaluating the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted in order to compare the proposed validity indices with several well-known ones.
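The core quantity the abstract describes can be sketched directly: for every pair of points, take the actual distance between them and the distance between the centroids of the clusters they belong to, then correlate the two across all pairs. The indices proposed in the paper add further ingredients not shown in this simplified version.

```python
# Correlation between pairwise point distances and the distances between
# the centroids of the clusters those points belong to (simplified sketch).
import numpy as np
from scipy.stats import pearsonr
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)

def centroid_correlation(X, labels):
    centroids = np.stack([X[labels == c].mean(axis=0) for c in np.unique(labels)])
    point_d = squareform(pdist(X))                        # actual pairwise distances
    cent_d = squareform(pdist(centroids))                 # centroid-to-centroid distances
    pair_cent_d = cent_d[labels[:, None], labels[None, :]]  # centroid distance per point pair
    iu = np.triu_indices(len(X), k=1)                     # each unordered pair once
    return pearsonr(point_d[iu], pair_cent_d[iu])[0]

for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: correlation = {centroid_correlation(X, labels):.3f}")
```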


Clustergam: visualisation of cluster analysis – Martin Fleischmann

#artificialintelligence

In this post, I introduce a new Python package to generate clustergrams from clustering solutions. The library has been developed as part of the Urban Grammar research project, and it is compatible with scikit-learn and GPU-enabled libraries such as cuML or cuDF within RAPIDS.AI. When we want to do some cluster analysis to identify groups in our data, we often use algorithms like K-Means, which require the specification of the number of clusters. But the issue is that we usually don't know how many clusters there are. There are many methods for determining the correct number, such as silhouette scores or the elbow plot, to name a few.
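A short usage sketch of the clustergram package described in the post is given below (installable via pip install clustergram); argument names may differ slightly between versions.

```python
# Fit k-means for a range of k and draw the clustergram: the paths of cluster
# means across k help judge how many clusters the data actually support.
from sklearn.datasets import load_iris
from sklearn.preprocessing import scale
from clustergram import Clustergram

X, _ = load_iris(return_X_y=True)
data = scale(X)                          # standardize before clustering

cgram = Clustergram(range(1, 9), n_init=10)
cgram.fit(data)
cgram.plot()
```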